TODO: Refine title

Initial Questions

TODO: Must have at least two questions. It is best to have different types of problems, i.e. one regression and one classification.

Objective

TODO: Analysis: Identify the questions. What is the objective/goal of processing this dataset? What answers are you interested in finding through this dataset?
TODO: Determine the details about the dataset (e.g. title, year, purpose of the dataset, dimensions, content, structure, summary) by exploring the raw data.
TODO: Short introduction with objective of the project.

Data Cleaning and Preprocessing

TODO: Which section of the data do you need to tidy?
TODO: Prepare data for analysis by correcting the variables and contents of the data.
TODO: Putting it all together as a new cleaned/processed dataset: For this task, you are also encouraged to explore any cleaning packages in R other than those learned in the course (dplyr, tidyr, lubridate, etc.).

Import libraries

# if (!require('dplyr'))
#  install.packages('dplyr', repos='https://cran.asia/');
if (!require('kableExtra'))
  install.packages('kableExtra', repos='https://cran.asia/');
# if (!require('lubridate'))
#   install.packages('lubridate', repos='https://cran.asia/');
if (!require('plotly'))
  install.packages('plotly', repos='https://cran.asia/');
if (!require('plyr'))
  install.packages('plyr', repos='https://cran.asia/');
if (!require('raster'))
  install.packages('raster', repos='https://cran.asia/');
if (!require('scales'))
  install.packages('scales', repos='https://cran.asia/');
# if (!require('tidyquant'))
#   install.packages('tidyquant', repos='https://cran.asia/');
# if (!require('tidyr'))
#   install.packages('tidyr', repos='https://cran.asia/');


# library(dplyr)
library(kableExtra)
# library(lubridate)
library(plotly)
library(plyr)
library(raster)
library(scales)
# library(tidyquant)
# library('tidyr')

Import dataset

# covid_malaysia_endpoint <- 'https://raw.githubusercontent.com/MoH-Malaysia/covid19-public/main/epidemic/cases_malaysia.csv'
# covid_state_endpoint <- 'https://raw.githubusercontent.com/MoH-Malaysia/covid19-public/main/epidemic/cases_state.csv'
covid_malaysia_endpoint <- 'cases_malaysia.csv'
covid_state_endpoint <- 'cases_state.csv'

df <- read.csv(covid_malaysia_endpoint, header=TRUE)
df_state <- read.csv(covid_state_endpoint, header=TRUE)

# Check the structure of the dataframe
str(df)
## 'data.frame':    708 obs. of  31 variables:
##  $ date                   : chr  "2020-01-25" "2020-01-26" "2020-01-27" "2020-01-28" ...
##  $ cases_new              : int  4 0 0 0 3 1 0 0 0 0 ...
##  $ cases_import           : int  4 0 0 0 3 1 0 0 0 0 ...
##  $ cases_recovered        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_active           : int  4 4 4 4 7 8 8 8 8 8 ...
##  $ cases_cluster          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_unvax            : int  4 0 0 0 3 1 0 0 0 0 ...
##  $ cases_pvax             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_fvax             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_boost            : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_child            : int  0 0 0 0 1 0 0 0 0 0 ...
##  $ cases_adolescent       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_adult            : int  1 0 0 0 2 1 0 0 0 0 ...
##  $ cases_elderly          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_0_4              : int  0 0 0 0 1 0 0 0 0 0 ...
##  $ cases_5_11             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_12_17            : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_18_29            : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_30_39            : int  0 0 0 0 1 0 0 0 0 0 ...
##  $ cases_40_49            : int  1 0 0 0 0 1 0 0 0 0 ...
##  $ cases_50_59            : int  0 0 0 0 1 0 0 0 0 0 ...
##  $ cases_60_69            : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_70_79            : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_80               : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cluster_import         : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ cluster_religious      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ cluster_community      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ cluster_highRisk       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ cluster_education      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ cluster_detentionCentre: int  NA NA NA NA NA NA NA NA NA NA ...
##  $ cluster_workplace      : int  NA NA NA NA NA NA NA NA NA NA ...
str(df_state)
## 'data.frame':    11344 obs. of  25 variables:
##  $ date            : chr  "2020-01-25" "2020-01-25" "2020-01-25" "2020-01-25" ...
##  $ state           : chr  "Johor" "Kedah" "Kelantan" "Melaka" ...
##  $ cases_new       : int  4 0 0 0 0 0 0 0 0 0 ...
##  $ cases_import    : int  4 0 0 0 0 0 0 0 0 0 ...
##  $ cases_recovered : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_active    : int  4 0 0 0 0 0 0 0 0 0 ...
##  $ cases_cluster   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_unvax     : int  4 0 0 0 0 0 0 0 0 0 ...
##  $ cases_pvax      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_fvax      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_boost     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_child     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_adolescent: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_adult     : int  1 0 0 0 0 0 0 0 0 0 ...
##  $ cases_elderly   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_0_4       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_5_11      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_12_17     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_18_29     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_30_39     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_40_49     : int  1 0 0 0 0 0 0 0 0 0 ...
##  $ cases_50_59     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_60_69     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_70_79     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cases_80        : int  0 0 0 0 0 0 0 0 0 0 ...
# Check the dimension of the dataframe
dim(df)
## [1] 708  31
dim(df_state)
## [1] 11344    25
# Check the first 6 rows
head(df) %>% kable('html') %>% kable_styling(font_size = 12)
date cases_new cases_import cases_recovered cases_active cases_cluster cases_unvax cases_pvax cases_fvax cases_boost cases_child cases_adolescent cases_adult cases_elderly cases_0_4 cases_5_11 cases_12_17 cases_18_29 cases_30_39 cases_40_49 cases_50_59 cases_60_69 cases_70_79 cases_80 cluster_import cluster_religious cluster_community cluster_highRisk cluster_education cluster_detentionCentre cluster_workplace
2020-01-25 4 4 0 4 0 4 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 NA NA NA NA NA NA NA
2020-01-26 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA NA NA NA NA NA NA
2020-01-27 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA NA NA NA NA NA NA
2020-01-28 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA NA NA NA NA NA NA
2020-01-29 3 3 0 7 0 3 0 0 0 1 0 2 0 1 0 0 0 1 0 1 0 0 0 NA NA NA NA NA NA NA
2020-01-30 1 1 0 8 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 NA NA NA NA NA NA NA
head(df_state) %>% kable('html') %>% kable_styling(font_size = 12)
date state cases_new cases_import cases_recovered cases_active cases_cluster cases_unvax cases_pvax cases_fvax cases_boost cases_child cases_adolescent cases_adult cases_elderly cases_0_4 cases_5_11 cases_12_17 cases_18_29 cases_30_39 cases_40_49 cases_50_59 cases_60_69 cases_70_79 cases_80
2020-01-25 Johor 4 4 0 4 0 4 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0
2020-01-25 Kedah 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2020-01-25 Kelantan 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2020-01-25 Melaka 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2020-01-25 Negeri Sembilan 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2020-01-25 Pahang 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# Examine the summary statistics
summary(df) %>% kable('html') %>% kable_styling(font_size = 12)
date cases_new cases_import cases_recovered cases_active cases_cluster cases_unvax cases_pvax cases_fvax cases_boost cases_child cases_adolescent cases_adult cases_elderly cases_0_4 cases_5_11 cases_12_17 cases_18_29 cases_30_39 cases_40_49 cases_50_59 cases_60_69 cases_70_79 cases_80 cluster_import cluster_religious cluster_community cluster_highRisk cluster_education cluster_detentionCentre cluster_workplace
Length:708 Min. : 0.0 Min. : 0.00 Min. : 0 Min. : 1 Min. : 0.0 Min. : 0.0 Min. : 0 Min. : 0.0 Min. : 0.00 Min. : 0.0 Min. : 0.0 Min. : 0.00 Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.00 Min. : 0 Min. : 0.0000 Min. : 0.00 Min. : 0.0 Min. : 0.0 Min. : 0.00 Min. : 0.00 Min. : 14.0
Class :character 1st Qu.: 53.5 1st Qu.: 3.00 1st Qu.: 51 1st Qu.: 1212 1st Qu.: 17.0 1st Qu.: 53.5 1st Qu.: 0 1st Qu.: 0.0 1st Qu.: 0.00 1st Qu.: 2.0 1st Qu.: 3.0 1st Qu.: 38.75 1st Qu.: 4.0 1st Qu.: 1.0 1st Qu.: 1.0 1st Qu.: 3.0 1st Qu.: 16.0 1st Qu.: 10.0 1st Qu.: 6.0 1st Qu.: 4.0 1st Qu.: 2.0 1st Qu.: 1.00 1st Qu.: 0 1st Qu.: 0.0000 1st Qu.: 0.00 1st Qu.: 43.0 1st Qu.: 4.0 1st Qu.: 2.25 1st Qu.: 2.00 1st Qu.: 172.5
Mode :character Median : 1542.0 Median : 6.00 Median : 1303 Median : 15124 Median : 364.0 Median : 1205.5 Median : 0 Median : 0.0 Median : 0.00 Median : 122.0 Median : 68.0 Median : 1156.50 Median : 93.5 Median : 46.5 Median : 75.0 Median : 68.0 Median : 466.5 Median : 391.5 Median : 197.0 Median : 119.0 Median : 64.5 Median : 22.00 Median : 8 Median : 0.0000 Median : 4.00 Median :137.5 Median : 16.0 Median : 16.50 Median : 37.00 Median : 494.5
NA Mean : 3900.4 Mean : 12.21 Mean : 3798 Mean : 45991 Mean : 693.7 Mean : 2389.9 Mean : 557 Mean : 942.0 Mean : 11.48 Mean : 522.4 Mean : 257.1 Mean : 2666.92 Mean : 348.9 Mean : 208.0 Mean : 314.4 Mean : 257.1 Mean :1006.1 Mean : 820.5 Mean : 494.7 Mean : 345.7 Mean : 223.6 Mean : 92.29 Mean : 33 Mean : 0.4727 Mean : 23.19 Mean :198.8 Mean : 27.3 Mean : 37.53 Mean : 60.47 Mean : 624.4
NA 3rd Qu.: 5299.5 3rd Qu.: 13.00 3rd Qu.: 5119 3rd Qu.: 63406 3rd Qu.:1088.8 3rd Qu.: 3322.8 3rd Qu.: 92 3rd Qu.: 290.2 3rd Qu.: 0.00 3rd Qu.: 744.8 3rd Qu.: 291.5 3rd Qu.: 3596.00 3rd Qu.: 554.5 3rd Qu.: 294.2 3rd Qu.: 450.8 3rd Qu.: 291.5 3rd Qu.:1243.5 3rd Qu.:1136.8 3rd Qu.: 670.2 3rd Qu.: 518.0 3rd Qu.: 364.0 3rd Qu.:149.00 3rd Qu.: 50 3rd Qu.: 0.0000 3rd Qu.: 16.00 3rd Qu.:300.2 3rd Qu.: 41.0 3rd Qu.: 40.00 3rd Qu.: 86.00 3rd Qu.:1049.5
NA Max. :24599.0 Max. :366.00 Max. :24855 Max. :263850 Max. :3394.0 Max. :12684.0 Max. :7318 Max. :8448.0 Max. :305.00 Max. :3437.0 Max. :1820.0 Max. :16450.00 Max. :1986.0 Max. :1362.0 Max. :2091.0 Max. :1820.0 Max. :6374.0 Max. :4922.0 Max. :3132.0 Max. :2066.0 Max. :1231.0 Max. :581.00 Max. :210 Max. :54.0000 Max. :359.00 Max. :825.0 Max. :189.0 Max. :501.00 Max. :439.00 Max. :2338.0
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA’s :342 NA’s :342 NA’s :342 NA’s :342 NA’s :342 NA’s :342 NA’s :342
summary(df_state) %>% kable('html') %>% kable_styling(font_size = 12)
date state cases_new cases_import cases_recovered cases_active cases_cluster cases_unvax cases_pvax cases_fvax cases_boost cases_child cases_adolescent cases_adult cases_elderly cases_0_4 cases_5_11 cases_12_17 cases_18_29 cases_30_39 cases_40_49 cases_50_59 cases_60_69 cases_70_79 cases_80
Length:11344 Length:11344 Min. : 0.0 Min. : 0.0000 Min. : 0.0 Min. : -2.0 Min. : 0.0 Min. : 0.0 Min. : 0.00 Min. : 0.00 Min. : 0.0000 Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.00 Min. : 0 Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.00 Min. : 0.000 Min. : 0.000
Class :character Class :character 1st Qu.: 0.0 1st Qu.: 0.0000 1st Qu.: 0.0 1st Qu.: 11.0 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.0000 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.0 1st Qu.: 0.00 1st Qu.: 0 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.0 1st Qu.: 0.00 1st Qu.: 0.000 1st Qu.: 0.000
Mode :character Mode :character Median : 19.0 Median : 0.0000 Median : 16.0 Median : 274.5 Median : 3.0 Median : 14.0 Median : 0.00 Median : 0.00 Median : 0.0000 Median : 1.00 Median : 1.00 Median : 13.0 Median : 1.00 Median : 0 Median : 1.00 Median : 1.00 Median : 4.00 Median : 4.00 Median : 2.00 Median : 2.0 Median : 1.00 Median : 0.000 Median : 0.000
NA NA Mean : 243.7 Mean : 0.7916 Mean : 237.3 Mean : 2874.0 Mean : 43.3 Mean : 149.2 Mean : 34.77 Mean : 58.97 Mean : 0.7377 Mean : 32.64 Mean : 16.06 Mean : 166.6 Mean : 21.81 Mean : 13 Mean : 19.64 Mean : 16.06 Mean : 62.85 Mean : 51.27 Mean : 30.91 Mean : 21.6 Mean : 13.97 Mean : 5.768 Mean : 2.063
NA NA 3rd Qu.: 237.0 3rd Qu.: 0.0000 3rd Qu.: 215.2 3rd Qu.: 2532.5 3rd Qu.: 40.0 3rd Qu.: 134.0 3rd Qu.: 3.00 3rd Qu.: 8.00 3rd Qu.: 0.0000 3rd Qu.: 29.00 3rd Qu.: 12.00 3rd Qu.: 160.0 3rd Qu.: 22.25 3rd Qu.: 11 3rd Qu.: 17.00 3rd Qu.: 12.00 3rd Qu.: 59.00 3rd Qu.: 50.00 3rd Qu.: 29.00 3rd Qu.: 22.0 3rd Qu.: 15.00 3rd Qu.: 6.000 3rd Qu.: 2.000
NA NA Max. :8792.0 Max. :74.0000 Max. :8801.0 Max. :94137.0 Max. :1545.0 Max. :6112.0 Max. :3890.00 Max. :3610.00 Max. :85.0000 Max. :1002.00 Max. :527.00 Max. :6549.0 Max. :637.00 Max. :429 Max. :608.00 Max. :527.00 Max. :2524.00 Max. :2097.00 Max. :1265.00 Max. :699.0 Max. :449.00 Max. :157.000 Max. :62.000

Handle missing/duplicate values

# Check for the columns with missing values
colSums(is.na(df)) %>% kable('html') %>% kable_styling(font_size = 12)
x
date 0
cases_new 0
cases_import 0
cases_recovered 0
cases_active 0
cases_cluster 0
cases_unvax 0
cases_pvax 0
cases_fvax 0
cases_boost 0
cases_child 0
cases_adolescent 0
cases_adult 0
cases_elderly 0
cases_0_4 0
cases_5_11 0
cases_12_17 0
cases_18_29 0
cases_30_39 0
cases_40_49 0
cases_50_59 0
cases_60_69 0
cases_70_79 0
cases_80 0
cluster_import 342
cluster_religious 342
cluster_community 342
cluster_highRisk 342
cluster_education 342
cluster_detentionCentre 342
cluster_workplace 342
colSums(is.na(df_state)) %>% kable('html') %>% kable_styling(font_size = 12)
x
date 0
state 0
cases_new 0
cases_import 0
cases_recovered 0
cases_active 0
cases_cluster 0
cases_unvax 0
cases_pvax 0
cases_fvax 0
cases_boost 0
cases_child 0
cases_adolescent 0
cases_adult 0
cases_elderly 0
cases_0_4 0
cases_5_11 0
cases_12_17 0
cases_18_29 0
cases_30_39 0
cases_40_49 0
cases_50_59 0
cases_60_69 0
cases_70_79 0
cases_80 0
# Show first few rows of the missing values
head(df[rowSums(is.na(df)) > 0,]) %>% kable('html') %>% kable_styling(font_size = 12)
date cases_new cases_import cases_recovered cases_active cases_cluster cases_unvax cases_pvax cases_fvax cases_boost cases_child cases_adolescent cases_adult cases_elderly cases_0_4 cases_5_11 cases_12_17 cases_18_29 cases_30_39 cases_40_49 cases_50_59 cases_60_69 cases_70_79 cases_80 cluster_import cluster_religious cluster_community cluster_highRisk cluster_education cluster_detentionCentre cluster_workplace
2020-01-25 4 4 0 4 0 4 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 NA NA NA NA NA NA NA
2020-01-26 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA NA NA NA NA NA NA
2020-01-27 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA NA NA NA NA NA NA
2020-01-28 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA NA NA NA NA NA NA
2020-01-29 3 3 0 7 0 3 0 0 0 1 0 2 0 1 0 0 0 1 0 1 0 0 0 NA NA NA NA NA NA NA
2020-01-30 1 1 0 8 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 NA NA NA NA NA NA NA
head(df_state[rowSums(is.na(df_state)) > 0,]) %>% kable('html') %>% kable_styling(font_size = 12)
date state cases_new cases_import cases_recovered cases_active cases_cluster cases_unvax cases_pvax cases_fvax cases_boost cases_child cases_adolescent cases_adult cases_elderly cases_0_4 cases_5_11 cases_12_17 cases_18_29 cases_30_39 cases_40_49 cases_50_59 cases_60_69 cases_70_79 cases_80
# The rows with missing values in df can be ignored: they are 2020 data, and the extra cluster_* columns are only filled in for 2021 data.
# There are no missing values in df_state.
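
One way to confirm the note above is to look at the date range of the incomplete rows. The sketch below is only an illustration: `toy` is a made-up data frame with invented counts standing in for `df`, and only the column names follow the dataset.

```r
# Hypothetical stand-in for df: two 2020 rows without cluster data, two 2021 rows with it
toy <- data.frame(
  date = as.Date(c("2020-12-30", "2020-12-31", "2021-01-01", "2021-01-02")),
  cases_new = c(10L, 20L, 30L, 40L),
  cluster_workplace = c(NA, NA, 3L, 4L)
)

# Rows with at least one missing value
missing_rows <- toy[rowSums(is.na(toy)) > 0, ]

# Date range covered by the incomplete rows
missing_range <- range(missing_rows$date)
print(missing_range)

# TRUE only when every incomplete row falls in 2020
all_in_2020 <- all(format(missing_rows$date, "%Y") == "2020")
print(all_in_2020)
```

Running the same steps on `df` would show whether the 342 incomplete rows really end with the 2020 data.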

# Check for duplicate rows
# duplicated() returns one logical value per row, so it must be used as a row index (note the comma)
df[duplicated(df), ]
df_state[duplicated(df_state), ]
# Each call returns zero rows when the data frame has no duplicated rows
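
`duplicated()` on a whole data frame only flags rows that are identical in every column. For the state-level data it may also be worth checking that each date/state pair appears once. The following is a minimal sketch under that assumption; `toy_state` and its values are invented for illustration.

```r
# Hypothetical stand-in for df_state: Johor appears twice on 2021-09-02 with different counts
toy_state <- data.frame(
  date = as.Date(c("2021-09-01", "2021-09-01", "2021-09-02", "2021-09-02")),
  state = c("Johor", "Kedah", "Johor", "Johor"),
  cases_new = c(5, 7, 6, 9)
)

# Whole-row check: nothing is flagged because cases_new differs
full_row_dups <- sum(duplicated(toy_state))

# Key check: flag rows whose (date, state) pair was already seen
dup_keys <- duplicated(toy_state[, c("date", "state")])
key_dups <- sum(dup_keys)

print(full_row_dups)
print(key_dups)
print(toy_state[dup_keys, ])
```

The key-based check catches repeated records that a whole-row comparison would miss.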

Preprocessing

df$date <- as.Date(df$date, format='%Y-%m-%d')
df_state$date <- as.Date(df_state$date, format='%Y-%m-%d')
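
With an explicit `format`, `as.Date()` returns `NA` for strings that do not match instead of raising an error, so a short sanity check after the conversion can be useful. The sketch below uses an invented character vector (`raw_dates`) rather than the real column, and also lists calendar days missing from the series.

```r
# Hypothetical input: one malformed string and one skipped day (2020-01-27)
raw_dates <- c("2020-01-25", "2020-01-26", "2020-01-28", "28/01/2020")

parsed <- as.Date(raw_dates, format = "%Y-%m-%d")

# Strings that failed to parse
bad_strings <- raw_dates[is.na(parsed)]
print(bad_strings)

# Days missing from an otherwise daily series
valid <- parsed[!is.na(parsed)]
expected <- seq(min(valid), max(valid), by = "day")
missing_days <- expected[!expected %in% valid]
print(missing_days)
```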

Exploratory Data Analysis

TODO: Results may include visualization, prediction, evaluation of models and discussion of output

A brief look at the graph

fig <- plot_ly(df, type = 'scatter', mode = 'lines') %>%
  add_trace(x = ~date, y = ~cases_new, name = 'Daily New COVID Cases') %>%
  layout(showlegend = FALSE)
options(warn = -1)

fig <- fig %>%
  layout(
         xaxis = list(zerolinecolor = '#ffff',
                      zerolinewidth = 2,
                      gridcolor = '#ffff'),
         yaxis = list(zerolinecolor = '#ffff',
                      zerolinewidth = 2,
                      gridcolor = '#ffff'),
         plot_bgcolor='#e5ecf6', width = 1200)


fig

Density Map

df_state <- df_state %>%
  mutate(date = as.Date(date, format = "%Y-%m-%d")) %>%
  filter(date == as.Date('2021-09-01')) %>%
  mutate(state = replace(state, state == "W.P. Kuala Lumpur", "Kuala Lumpur")) %>%
  mutate(state = replace(state, state == "W.P. Labuan", "Labuan")) %>%
  mutate(state = replace(state, state == "W.P. Putrajaya", "Putrajaya")) %>%
  arrange(state) %>%
  dplyr::rename(NAME_1 = state)
  
malaysia <- getData("GADM", country = "MYS", level = 1)
malaysia@data$id <- rownames(malaysia@data)
malaysia@data <- join(malaysia@data, df_state, by = "NAME_1")
malaysia_df <- fortify(malaysia)
## Regions defined for each Polygons
malaysia_df <- join(malaysia_df, malaysia@data, by = "id")

theme_opts <- list(theme(
  panel.grid.minor = element_blank(),
  panel.grid.major = element_blank(),
  panel.background = element_blank(),
  plot.background = element_blank(),
  axis.line = element_blank(),
  axis.text.x = element_blank(),
  axis.text.y = element_blank(),
  axis.ticks = element_blank(),
  axis.title.x = element_blank(),
  axis.title.y = element_blank(),
  plot.title = element_blank()
))

# https://garthtarr.github.io/meatR/ggplot_extensions.html
# https://rstudio-pubs-static.s3.amazonaws.com/160207_ebe47475bb7744429b9bd4c908e2dc45.html
ggplot() +
  geom_polygon(data = malaysia_df, aes(x = long, y = lat, group = group, fill = cases_new), color = "white", size = 0.25) +
  theme(aspect.ratio = 2/5) +
  scale_fill_distiller(name = "No. of New Cases", palette = "YlOrRd", direction=1, breaks = pretty_breaks(n = 5)) +
  labs(title = paste('Number of New Cases in Each State on', '2021-09-01'))
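
The join on `NAME_1` only fills in states whose names are spelled identically in both sources; any other state gets `NA` for `cases_new`. A small check before joining can reveal such mismatches. This is a sketch only: `map_names` and `data_names` are hypothetical vectors, not the actual GADM or MoH name lists.

```r
# Hypothetical name lists for illustration
map_names  <- c("Johor", "Kuala Lumpur", "Labuan", "Putrajaya")
data_names <- c("Johor", "W.P. Kuala Lumpur", "W.P. Labuan", "W.P. Putrajaya")

# Names in the case data that have no counterpart in the map
unmatched_before <- setdiff(data_names, map_names)
print(unmatched_before)

# Strip the "W.P. " prefix, as the replace() calls above do one state at a time
cleaned <- sub("^W\\.P\\. ", "", data_names)

# An empty result means every state will receive a value in the join
unmatched_after <- setdiff(cleaned, map_names)
print(unmatched_after)
```

Applied to the real data, the same `setdiff()` call between `df_state$NAME_1` and `malaysia@data$NAME_1` would list any state still left unmatched.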

Machine Learning

TODO: Results may include visualization, prediction, evaluation of models and discussion of output

Conclusion

TODO: Conclusion

Presentation and Submission

TODO: Report: The submission will be an R Markdown document published on RPubs, and the link is to be submitted in Spectrum. The R Markdown document may include the following:

TODO: Only one member per group will submit the report.
TODO: Each group is required to prepare a 10-minute presentation with PowerPoint.
TODO: Both group members must present their parts.

End of Report